Predicting Adversarial Examples with High Confidence
Authors
Abstract
It has been suggested that adversarial examples cause deep learning models to make incorrect predictions with high confidence. In this work, we take the opposite stance: an overly confident model is more likely to be vulnerable to adversarial examples. This work is one of the most proactive approaches taken to date, as we link robustness with non-calibrated model confidence on noisy images, providing a data-augmentation-free path forward. The adversarial example phenomenon is most easily explained by the trend of increasing non-regularized model capacity, while the diversity and number of samples in common datasets have remained flat. Test accuracy has incorrectly been equated with true generalization performance, ignoring that training and test splits are often extremely similar in terms of the overall representation space. The transferability property of adversarial examples was previously used as evidence against overfitting arguments, on the assumption that overfitting is a random effect, but overfitting is not always random.
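One concrete way to read the abstract's link between calibration and robustness is to measure how a classifier's softmax confidence behaves as inputs are corrupted with increasing Gaussian noise: a well-calibrated model's confidence should decay toward chance on near-noise inputs, while an overconfident one stays near 1. Below is a minimal PyTorch sketch of such a probe; the `model`, `images`, and noise scales are illustrative assumptions, not the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_confidence(model, images, sigma):
    """Average top-1 softmax confidence on images corrupted with
    additive Gaussian noise of standard deviation sigma."""
    noisy = images + sigma * torch.randn_like(images)
    probs = F.softmax(model(noisy), dim=1)  # assumes model outputs logits
    return probs.max(dim=1).values.mean().item()

# Hypothetical usage: confidence should fall as sigma grows for a
# calibrated model, but remain high for an overconfident one.
# for sigma in (0.0, 0.1, 0.3, 1.0):
#     print(sigma, mean_confidence(model, images, sigma))
```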
Similar resources
Adversarial Phenomenon in the Eyes of Bayesian Deep Learning
Deep learning models are vulnerable to adversarial examples, i.e., images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in predictions when observing adversarial misclassification. ...
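As a rough illustration of Bayesian uncertainty in this setting (not necessarily the exact method used in that paper), Monte Carlo dropout keeps dropout active at test time and treats the spread over stochastic forward passes as predictive uncertainty. A sketch, assuming a PyTorch model that contains dropout layers:

```python
import torch
import torch.nn.functional as F

def enable_dropout(model):
    """Switch only the dropout layers to training mode,
    leaving batch norm and the rest in eval mode."""
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=30):
    """Mean softmax prediction and per-class standard deviation
    over n_samples stochastic forward passes (MC dropout)."""
    model.eval()
    enable_dropout(model)
    probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)
```

A high standard deviation flags inputs, adversarial or otherwise, on which the class confidence alone should not be trusted.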
On the (Statistical) Detection of Adversarial Examples
Machine Learning (ML) models are applied in a variety of tasks such as network intrusion detection or malware classification. Yet, these models are vulnerable to a class of malicious inputs known as adversarial examples. These are slightly perturbed inputs that are classified incorrectly by the ML model. The mitigation of these adversarial inputs remains an open problem. As a step towards a mod...
On the Limitation of Local Intrinsic Dimensionality for Characterizing the Subspaces of Adversarial Examples
Understanding and characterizing the subspaces of adversarial examples aid in studying the robustness of deep neural networks (DNNs) to adversarial perturbations. Very recently, Ma et al. (2018) proposed to use local intrinsic dimensionality (LID) in layer-wise hidden representations of DNNs to study adversarial subspaces. It was demonstrated that LID can be used to characterize the adversarial...
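For reference, the LID estimate used in that line of work is typically the maximum likelihood estimator computed from distances to an example's k nearest neighbours within a mini-batch. A small NumPy sketch; the batch contents and the choice of k are illustrative:

```python
import numpy as np

def lid_mle(x, batch, k=20):
    """MLE estimate of local intrinsic dimensionality:
    LID(x) ~= -1 / mean_i log(r_i / r_k), where r_1 <= ... <= r_k
    are distances from x to its k nearest neighbours in batch."""
    dists = np.linalg.norm(batch - x, axis=1)
    dists = np.sort(dists[dists > 0])[:k]  # exclude x itself if present
    return -1.0 / np.mean(np.log(dists / dists[-1]))
```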
Defensive Collaborative Multi-task Training
Deep neural networks (DNNs) have shown impressive performance on hard perceptual problems. However, researchers have found that DNN-based systems are vulnerable to adversarial examples, which contain specially crafted, human-imperceptible perturbations. Such perturbations cause DNN-based systems to misclassify the adversarial examples, with potentially disastrous consequences where safety or security i...
Explaining and Harnessing Adversarial Examples
Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitti...
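The attack introduced in that paper is the fast gradient sign method (FGSM): perturb the input by a small step in the direction of the sign of the loss gradient with respect to the input. A minimal PyTorch sketch; the eps value and the [0, 1] pixel range are common conventions rather than requirements of the method:

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.03):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range
```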
Journal title: CoRR
Volume: abs/1802.04457
Issue: -
Pages: -
Publication date: 2018